Optimized gene editing utilizing a recombinant endonuclease system

ABSTRACT

Described herein are methods and compositions for genomic editing. Endonucleases used for genomic editing induce breaks in double stranded DNA, but knock-ins are notoriously inefficient because they rely on random integration of homologous DNA sequences into the break site by repair proteins. To address these issues, described herein are novel recombinant fusion proteins that actively recruit linear DNA inserts into closer proximity to the genomic cleavage site, increasing the integration efficiency of large DNA fragments into the genome. Such improvements to genomic editing technology allow one to use lower linear DNA concentrations without sacrificing efficiency and can be further combined with other features, such as fluorescent protein reporting systems.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 120 as a division of U.S. patent application Ser. No. 15/573,732 filed Nov. 13, 2017, now U.S. Pat. No. 11,535,871, which is a National Phase of International Application No. PCT/US2016/032367 filed May 13, 2016, which designated the U.S. and that International Application was published under PCT Article 21(2) in English, which also includes a claim of priority under 35 U.S.C. § 119(e) to U.S. provisional patent application No. 62/161,487 filed May 14, 2015. The contents of each of the above applications are herein incorporated by reference.

REFERENCE TO SEQUENCE LISTING

This application contains an ST.26-compliant XML sequence listing submitted as an electronic file named “065715_000063US00_SequenceListing.xml”, having a size in bytes of 21,668 bytes, and created on Mar. 20, 2023 (production date noted as 2023-03-20), which replaces the ST.25-compliant sequence listing file in the priority applications no. PCT/US2016/032367 and 15/573,732 and does not add new matter. The ST.26-compliant XML sequence listing is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

Described herein are methods and compositions for use in in vitro and in vivo manipulation of genetic sequences for research and therapeutic activities, including generation of knockout and knock-in sequences for genomic editing.

BACKGROUND

CRISPR-Cas genome editing has become a more mature technique within the last 10 years. Like all other genome editing techniques, it is adequate at generating knockout phenotypes but not very efficient at knock-in of donor DNA sequences. The development of more useful enzymes for biotechnology and primary research has included improving the efficacy of enzymes, creating new enzymes to fill needs, and adapting existing enzymes to expand their functionality. For example, Klenow exonuclease-minus enzyme is a mutant of the Klenow large fragment. This modified DNA polymerase retains its polymerase activity while losing the large fragment's ability to chew back 3′ DNA overhangs. Another example is PHUSION® DNA polymerase, a proprietary fusion protein consisting of a proofreading DNA polymerase plus a double-stranded DNA-binding domain that confers higher processivity. Extending these principles to the realm of genome editing, it should be possible to create a new Cas-9 protein that preserves the existing DNA endonuclease activity while increasing the integration efficiency of donor DNA sequences into the genome. Several strategies are proposed to address this need.

Several obstacles in particular concern the insertion of donor DNA into the genome. Non-homologous end joining (NHEJ), while an appropriate method for knocking out genes, is wholly insufficient for integrating donor DNA without introducing other mutations. Homology directed repair (HDR) encompasses at least two forms: long sequence homology arms, which require Holliday structures to resolve integrations, or short homology arms, which rely on strand invasion and the emergency DNA repair system to resolve the integration events. Any combination of HDR and NHEJ will predominantly yield indel (insertion and deletion) mutations within the portion of the damaged DNA resolved by NHEJ. In essence, stochastic processes determine the possible integration of homologous DNA and plague efficient genomic editing for knock-ins. These include the inversely proportional relationship between the length of the donor DNA fragment and the efficiency of integration; the direct proportionality between integration efficiency and donor DNA concentration in the cell, which is limited by the cytotoxicity of an overabundance of linear DNA fragments; and the spatial restriction of the two free ends of a double stranded break, which makes re-annealing of the two ends the most likely outcome.

Described herein is the development of recombinant fusion proteins that actively recruit linear DNA inserts into closer proximity to the genomic cleavage site, thereby increasing the integration efficiency, particularly of large DNA fragments, into the genome. Such improvements to genomic editing technology allow one to use lower linear DNA concentrations without sacrificing efficiency and can be further combined with other features, such as fluorescent protein reporting systems.

SUMMARY OF THE INVENTION

Described herein is a composition including a vector encoding a fusion protein including at least one endonuclease and a DNA binding moiety. In various embodiments, the fusion protein includes at least one endonuclease selected from the group consisting of: clustered regularly interspaced short palindromic repeats (CRISPR) protein, zinc finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs). In various embodiments, the CRISPR protein includes cas9. In various embodiments, the DNA binding moiety includes a zinc finger protein. In various embodiments, the zinc finger includes a left handed CCR5 binding protein. In various embodiments, the at least one endonuclease and DNA binding moiety are joined by a linker including two, three, four, five, six, seven, eight, nine, ten or more amino acids. In various embodiments, the fusion protein includes a fluorescent labeled protein. In various embodiments, the fluorescent labeled protein includes one or more proteins selected from the group consisting of: green fluorescent protein (GFP), enhanced GFP (eGFP), red fluorescent protein (RFP), and mCherry. In various embodiments, the fusion protein includes a nuclear localization signal (NLS). In various embodiments, the NLS is SV40 NLS.

Further described herein is a method of genomic editing including (a) providing a quantity of one or more vectors encoding a fusion protein including at least one endonuclease and a DNA binding moiety and (b) contacting a population of cells with the quantity of the one or more vectors, wherein the fusion protein is capable of inducing a double stranded break (DSB) and homologous recombination (HR) of the DSB results in editing of the genome of the population of cells. In various embodiments, the method includes contacting the population of cells with one or more guide RNAs (gRNAs) in step (b). In various embodiments, the method includes contacting the population of cells with template DNA in step (b). In various embodiments, the template DNA includes at least one expression cassette, two flanking sequences, and a DNA binding moiety sequence. In various embodiments, the two flanking sequences are each at least 10 bp, and homologous to sequences in the genome of the population of cells. In various embodiments, the DNA binding moiety sequence includes CCR5. In various embodiments, contacting the population of cells includes a technique selected from the group consisting of: transfection, electroporation, and transformation. In various embodiments, the population of cells includes stem cells or progenitor cells.
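For illustration only, the template DNA described above can be sketched as a simple sequence-assembly routine. The function name `build_donor`, the 5′-to-3′ ordering of the parts (DNA binding moiety sequence placed 5′ of the left flanking sequence, loosely following the 5′-end placement of the CCR5 binding sequence described for FIG. 3A), and all placeholder sequences are assumptions, not the disclosed construct; only the minimum of 10 bp for each flanking sequence comes from the text above.

```python
# Hypothetical sketch: assemble a linear donor template from its parts.
VALID_BASES = set("ACGT")


def build_donor(binding_site: str, left_arm: str, cassette: str,
                right_arm: str, min_arm_len: int = 10) -> str:
    """Return binding_site + left_arm + cassette + right_arm, written 5'->3'.

    The part ordering is an illustrative assumption; min_arm_len=10 mirrors
    the 'at least 10 bp' flanking-sequence embodiment.
    """
    parts = {
        "binding_site": binding_site,
        "left_arm": left_arm,
        "cassette": cassette,
        "right_arm": right_arm,
    }
    # Every part must be a non-empty DNA sequence.
    for name, seq in parts.items():
        if not seq or set(seq.upper()) - VALID_BASES:
            raise ValueError(f"{name} must be a non-empty A/C/G/T sequence")
    # Both flanking (homology) sequences must meet the minimum length.
    for name in ("left_arm", "right_arm"):
        if len(parts[name]) < min_arm_len:
            raise ValueError(f"{name} is shorter than {min_arm_len} bp")
    return (binding_site + left_arm + cassette + right_arm).upper()


if __name__ == "__main__":
    # Placeholder sequences only; none is a real CCR5 site or expression cassette.
    donor = build_donor("ACGTACGTACGTA", "GATTACA" * 3, "ATGCCCTAA", "TTAGGC" * 3)
    print(len(donor), donor)
```

A real design would also check the flanking sequences against the genomic target and the expression cassette against the reporter of choice; the sketch only enforces composition and length.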

Described herein is a kit for genomic editing including one or more vectors encoding a fusion protein including at least one endonuclease and a DNA binding moiety, and template DNA including at least one expression cassette, two flanking sequences, and a DNA binding moiety sequence. In various embodiments, the kit includes one or more guide RNAs (gRNAs). In various embodiments, the fusion protein of the kit includes at least one endonuclease that is a CRISPR protein and a DNA binding moiety that is a zinc finger protein. In various embodiments, the kit includes a fluorescent labeled protein. In various embodiments, the kit includes a nuclear localization signal (NLS).

Also described herein is a method of multi-locus genomic editing including inducing sticky end formation at one or more loci by adding a CRISPR protein, providing a quantity of one or more guide strand RNAs, ligating one or more single-stranded donor DNAs, hybridizing one or more double-stranded DNAs with a terminating oligonucleotide, synthesizing one or more double stranded DNAs from the one or more single-stranded donor DNAs to the one or more double-stranded DNAs, thereby completing the donor DNA strand to form a sticky end, and joining compatible sticky ends at one or more loci.
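The final step, joining compatible sticky ends, rests on a simple base-pairing rule: two 5′ overhangs, each written 5′→3′, can anneal when one is the reverse complement of the other. The sketch below, with hypothetical function names, checks only that rule; it does not model ligation efficiency, mismatch tolerance, or the homologous recombination that completes integration.

```python
# Hypothetical sketch: test whether two 5' overhangs can anneal.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sticky_ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True if two 5' overhangs, each written 5'->3', are fully complementary.

    Both overhangs must be non-empty and of equal length.
    """
    return (len(overhang_a) == len(overhang_b) > 0
            and overhang_a.upper() == revcomp(overhang_b))
```

For example, `sticky_ends_compatible("TACG", "CGTA")` is true. A palindromic overhang such as `ACGT` is its own reverse complement and can pair with itself, which is one reason the overhang sequence matters when controlling insert orientation.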

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1. Recombinant Cas-9 design. Conventionally, guide RNA (gRNA) is complementary to target DNA, and Cas-9 endonuclease cleaves the genomic DNA, with stochastic processes determining the possible integration of homologous DNA. To shunt these processes towards favorable recombination events, a recombinant fusion protein includes a CCR5 zinc finger as a DNA binding moiety that binds the homologous linear DNA fragment to be inserted and brings the fragment closer to the double stranded break (DSB). A recombinant Cas-9 that increases the proximity of the DSB and the linear DNA will lead to more efficient genome editing.

FIG. 2. Comparison between wild-type Cas-9 and recombinant Cas-9 using GFP as the reporter and beta-actin as the target gene. Under the wild-type condition, 5 out of 18 CFP positive colonies were also GFP positive, 2 of which were only GFP positive (i.e., stable integrants). Under the Cas-9-zinc finger condition, 9 out of 12 CFP positive colonies were also GFP positive, 4 of which were only GFP positive. Scale bars=10 μm.

FIG. 3: (FIG. 3A) Data showing that the addition of the zinc finger to Cas9 does not confer additional zinc-finger dependent endonuclease activity. Donor DNA (900 bp) contains the CCR5 DNA binding sequence (base pairs 80-92 on the 5′ end of the linear DNA). Combinations of protein and guide strand are labeled for each condition. (FIG. 3B) Three examples of GFP signal due to integration of donor DNA. The GFP positive clones are being validated. mCherry is the internal viability control.

FIG. 4: Integration of GFP in beta-actin validated by PCR. PCR of GFP is a 300 bp fragment of the total GFP sequence; the beta-actin reverse primer anneals 300 bp downstream of the sfGFP integration site.

FIGS. 5A and 5B: Cas9-zinc finger fusion protein linearizes plasmid DNA efficiently. There appears to be off-target cleavage, perhaps due to the zinc finger, although off-target cleavage is ameliorated when the donor DNA is included. Lanes 5 and 6 are one sample loaded into two wells because well 5 was partially occluded.

FIG. 6: Parallel multi-locus editing directed by a guide-strand RNA mixture. The process begins with T7 RNA synthesis of guide-strand RNA (FIG. 6A), followed by splint (FIG. 6B) mediated ligation (FIG. 6C) of single-stranded donor DNA sequence (FIG. 6D) onto the 5′ end of the guide-strand RNA. The 5′ nucleotide of the synthesized donor DNA (FIG. 6E) has an exonuclease resistant phosphorothioate bond with its neighboring nucleotide. Thereafter, hybridization of a double-stranded DNA terminating oligonucleotide (FIG. 6F) is followed by an isothermal DNA polymerase (Klenow exo-) reaction to fill in the double strandedness (FIG. 6G) from the splinting primer (step 2) to the terminating oligo (step 3), and a T4 DNA ligase reaction (FIG. 6H) completes the donor fragment with the appropriate 5′ sticky end (FIG. 6I) for donor DNA sticky end ligation to the sticky end of the genomic DNA (FIG. 6J) digested by cpf1. Complete integration happens upon ligation of the sticky end (FIG. 6K) of the donor DNA and homologous recombination between donor and genomic DNA (FIG. 6L) due to their proximity and sequence homology.

DETAILED DESCRIPTION OF THE INVENTION

All references cited herein are incorporated by reference in their entirety as though fully set forth. Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Allen et al., Remington: The Science and Practice of Pharmacy 22nd ed., Pharmaceutical Press (Sep. 15, 2012); Hornyak et al., Introduction to Nanoscience and Nanotechnology, CRC Press (2008); Singleton and Sainsbury, Dictionary of Microbiology and Molecular Biology 3rd ed., revised ed., J. Wiley & Sons (New York, N.Y. 2006); Smith, March's Advanced Organic Chemistry Reactions, Mechanisms and Structure 7th ed., J. Wiley & Sons (New York, N.Y. 2013); Singleton, Dictionary of DNA and Genome Technology 3rd ed., Wiley-Blackwell (Nov. 28, 2012); and Green and Sambrook, Molecular Cloning: A Laboratory Manual 4th ed., Cold Spring Harbor Laboratory Press (Cold Spring Harbor, N.Y. 2012), provide one skilled in the art with a general guide to many of the terms used in the present application. For references on how to prepare antibodies, see Greenfield, Antibodies A Laboratory Manual 2nd ed., Cold Spring Harbor Press (Cold Spring Harbor N.Y., 2013); Köhler and Milstein, Derivation of specific antibody-producing tissue culture and tumor lines by cell fusion, Eur. J. Immunol. 1976 July, 6(7):511-9; Queen and Selick, Humanized immunoglobulins, U.S. Pat. No. 5,585,089 (1996 December); and Riechmann et al., Reshaping human antibodies for therapy, Nature 1988 Mar. 24, 332(6162):323-7.

One skilled in the art will recognize many methods and materials similar or equivalent to those described herein, which could be used in the practice of the present invention. Indeed, the present invention is in no way limited to the methods described herein. For purposes of the present invention, the following terms are defined below.

“Administering” and/or “administer” as used herein refer to any route for delivering a pharmaceutical composition to a patient. Routes of delivery may include non-invasive peroral (through the mouth), topical (skin), transmucosal (nasal, buccal/sublingual, vaginal, ocular and rectal) and inhalation routes, as well as parenteral routes, and other methods known in the art. Parenteral refers to a route of delivery that is generally associated with injection, including intraorbital, infusion, intraarterial, intracarotid, intracapsular, intracardiac, intradermal, intramuscular, intraperitoneal, intrapulmonary, intraspinal, intrasternal, intrathecal, intrauterine, intravenous, subarachnoid, subcapsular, subcutaneous, transmucosal, or transtracheal. Via the parenteral route, the compositions may be in the form of solutions or suspensions for infusion or for injection, or as lyophilized powders.

“Modulation” or “modulates” or “modulating” as used herein refers to upregulation (i.e., activation or stimulation), downregulation (i.e., inhibition or suppression) of a response, or the two in combination or apart.

“Pharmaceutically acceptable carriers” as used herein refer to conventional pharmaceutically acceptable carriers useful in this invention.

“Promote” and/or “promoting” as used herein refer to an augmentation in a particular behavior of a cell or organism.

“Subject” as used herein includes all animals, including mammals and other animals, including, but not limited to, companion animals, farm animals and zoo animals. The term “animal” can include any living multi-cellular vertebrate organism, a category that includes, for example, a mammal, a bird, a simian, a dog, a cat, a horse, a cow, a rodent, and the like. Likewise, the term “mammal” includes both human and non-human mammals.

“Therapeutically effective amount” as used herein refers to the quantity of a specified composition, or active agent in the composition, sufficient to achieve a desired effect in a subject being treated. A therapeutically effective amount may vary depending upon a variety of factors, including but not limited to the physiological condition of the subject (including age, sex, disease type and stage, general physical condition, responsiveness to a given dosage, desired clinical effect) and the route of administration. One skilled in the clinical and pharmacological arts will be able to determine a therapeutically effective amount through routine experimentation.

“Treat,” “treating” and “treatment” as used herein refer to both therapeutic treatment and prophylactic or preventative measures, wherein the object is to prevent or slow down (lessen) the targeted condition, disease or disorder (collectively “ailment”) even if the treatment is ultimately unsuccessful. Those in need of treatment may include those already with the ailment as well as those prone to have the ailment or those in whom the ailment is to be prevented.

Genome editing has been a hot area of research for the past decade, with an emphasis on generating genetic knockouts for genes of interest within model animals. Most methods for knocking in donor genetic sequences have relied either on brute force methods with random sites of integration or on homologous recombination events. Some of these techniques only work in defined model organisms, while others have only limited application. Therefore, a more wide-ranging and efficient toolbox for integrating donor DNA with sequences ranging from 300-3,500 bp in length would help answer certain questions within biology, speed the generation of useful transgenic animals, and make theranostics cheaper, faster, and more accessible to clinicians and their patients. This toolbox could make even a small lab more efficient at making “footprint-free” transgenic cell lines/animals using industry standard cell culture techniques, quickly, with fewer attempts, and at a reasonable overhead cost.

A fundamental concern regarding donor DNA integration, on top of the problems addressed previously about peri-integration mutations, is the inversely proportional relationship between the length of the donor DNA fragment and the efficiency of integration. It has been reported that inserted donor DNA sequences of 27 and 54 bases, with 41 and 49 bases of sequence homology flanking the 27 bases (117 bases total) or 33 base pairs on either side of the 54 base pairs (120 bp total), result in the following efficiencies: for the 117 base donor DNA, a 64% mutation rate and a 3.2% integration rate, with half of those integrants lacking other mutations; and for the 120 bp donor DNA, an 86% mutation rate and 15.6% integration, with 3.5% precise integration. Others have used TALEN endonucleases combined with very creative mathematics to report a mutation rate of 0% and a 50% integration rate (but, using a worst case scenario, as low as 8.9%) and 1.5% germline transmission of 700 bases encoding GFP flanked by 827 bases and 904 bases of homology to the integration site.
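The reported figures can be cross-checked with simple arithmetic. In the sketch below, the 1.6% precise-integration value for the 117 base donor is derived from "half of" its 3.2% integrants, and treating the 3.5% precise-integration value for the 120 bp donor as a percentage of all analysed events (so that it can be divided by the 15.6% integration rate) is an assumption about how the cited study reported it. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DonorReport:
    """Hypothetical container for one reported donor design."""
    left_arm: int            # bases of 5' homology
    insert: int              # bases of inserted sequence
    right_arm: int           # bases of 3' homology
    integration_pct: float   # integrants, % of analysed events
    precise_pct: float       # precise integrants, % of analysed events (assumed)

    @property
    def total_bp(self) -> int:
        return self.left_arm + self.insert + self.right_arm

    @property
    def precise_fraction_of_integrants(self) -> float:
        return self.precise_pct / self.integration_pct


reports = {
    "27-base insert": DonorReport(41, 27, 49, 3.2, 1.6),
    "54-base insert": DonorReport(33, 54, 33, 15.6, 3.5),
}

if __name__ == "__main__":
    for name, r in reports.items():
        print(f"{name}: {r.total_bp} bp donor, "
              f"{r.precise_fraction_of_integrants:.0%} of integrants precise")
```

Under these assumptions the longer insert integrates more often, but only about a fifth of its integrants are precise, compared with half for the shorter insert.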

Second, integration efficiency of donor DNA into the genome is directly proportional to the donor DNA concentration in the cell. The more abundant a fragment is in the cell, the more likely it is to participate in the DNA repair mechanism. A problem arises from an overabundance of linear DNA fragments causing cytotoxicity, due in large part to the innate ability of cells to defend themselves against DNA viruses, and also to saturating the cell's ability to recover from the endonuclease damage. The inability of a cell to discriminate between damaged DNA ends and the donor DNA ends could leave the genomic DNA unrepaired.

Third, the nucleus of a cell is a structure densely packed with genomic DNA. The two free ends of a double stranded break in the genomic DNA are spatially restricted, cannot diffuse away from each other, and therefore the most likely outcome is re-annealing of the two ends with some sequence added and/or removed at the locus of endonuclease activity.

Genomic Editing. What has been shown up to this point is that genome engineering is a versatile and powerful tool to correct genetic mutations. Site-specific chromosomal integration can target desired nucleotide changes, including introducing therapeutic gene cassettes into safe landing sites within chromosomes, disrupting the coding or non-coding regions of specific alleles, and correcting genetic mutations to reverse the disease phenotype. Conventional technologies such as Zinc Finger Nucleases (ZFNs) and Transcription Activator-Like Effector Nucleases (TALENs) have provided a significant groundwork of proof-of-concept studies for genome editing and therapy. Yet the most recent advances in the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated endonuclease protein (cas) system extend this versatility and convenience by reducing the number of steps required to design targeting of a particular mutation.

Briefly, genome editing using tools such as ZFNs can be based on the introduction of a site-specific DNA double stranded break (DSB) into the locus of interest. Key to this process is the cellular machinery for efficiently repairing DSBs via the homology-directed repair (HDR) or non-homologous end joining (NHEJ) pathways. The mechanisms of these DNA repair pathways can generate defined genetic outcomes. After the DSB is introduced, NHEJ repair can rapidly and efficiently ligate the two broken ends, providing an opportunity for the gain or loss of genetic information. This feature can be exploited to introduce small insertions and/or deletions at the site of the break, thereby disrupting a target gene. If, for example, a disease results from toxic protein buildup, introduction of a nonsense or missense sequence can effectively eliminate the aberrant protein, correcting human disorders caused by inherited gene defects. Alternatively, if a specifically designed homologous donor DNA is provided in combination with the ZFNs, this template can result in gene correction or insertion, as repair of the DSB can change a few nucleotides at the endogenous site or add a new gene at the break site.

While pioneering much of what is known about the genomic editing process, conventional technologies such as ZFNs and TALENs face significant challenges. These early generation nucleases, ZFNs and TALENs, are artificial fusion proteins composed of an engineered DNA binding domain fused to a non-specific nuclease cleavage domain from the FokI restriction enzyme. Zinc finger and transcription activator-like effector repeat domains with customized specificities can be joined to bind extended DNA sequences. While modifying the DNA-binding specificities of ZFNs and TALENs provides a significant level of targeting control, individual zinc finger domains are heterogeneous, and their DNA binding can be context-dependent. TALE repeat domains appear less susceptible to these context-dependent effects and can be modularly assembled to recognize virtually any DNA sequence via a simple one-to-one code between individual repeats and the four possible DNA nucleotides, but assembly of DNAs encoding large numbers of highly conserved TALE repeats can require the use of non-standard molecular biology cloning methods.

Whereas both ZFNs and TALENs use protein-DNA interactions for targeting, the bacterial CRISPR-Cas system is unique and flexible because it uses RNA as the moiety that targets the nuclease to a desired DNA sequence. In contrast to ZFN and TALEN platforms, CRISPR-Cas uses simple Watson-Crick base pairing rules between an engineered RNA and the target DNA site. Generally, two components form the core of a CRISPR nuclease system: a Cas nuclease (e.g., cas9) and a guide RNA (gRNA), the gRNA being derived from a fusion of a CRISPR-derived RNA (“crRNA”) and a trans-activating CRISPR RNA (“tracrRNA”). In the most well-studied example, the single gRNA complexes with a cas protein (e.g., cas9) to mediate cleavage of target DNA sites that are complementary to the first (5′) 20 nts of the gRNA and that lie next to a protospacer adjacent motif (“PAM”) sequence (canonical form 5′-NGG for Streptococcus pyogenes cas9, although an alternative 5′-NAG PAM also exists). Thus, with this system, Cas9 nuclease activity can be directed to any DNA sequence of the form N20-NGG simply by altering the first 20 nts of the gRNA to correspond to the target DNA sequence. It is noted that Type II CRISPR systems from other species of bacteria, which recognize alternative PAM sequences and utilize different crRNA and tracrRNA sequences, could also be used to perform targeted genome editing.
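The N20-NGG targeting rule lends itself to a short search routine. The following is a minimal sketch, not a guide design tool: the function name is hypothetical, it scans both strands of a plain A/C/G/T string for a 20-nt protospacer immediately followed by the PAM (with `N` matching any base), and it ignores off-target scoring and every other design criterion.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_spcas9_targets(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for every N20-PAM site.

    Both the given strand ('+') and its reverse complement ('-') are
    scanned. Start coordinates refer to the strand on which the site is read.
    """
    seq = seq.upper()
    pam_regex = pam.upper().replace("N", "[ACGT]")
    # A zero-width lookahead lets overlapping sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    hits = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits
```

Passing `pam="NAG"` searches for the alternative PAM mentioned above; other PAM patterns for other Type II systems can be supplied the same way.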

The Cas9-induced DSBs have been used to introduce NHEJ-mediated indel mutations as well as to stimulate HDR with both double-stranded plasmid DNA and single-stranded oligonucleotide donor templates. The capability to introduce DSBs at multiple sites in parallel is a unique advantage of the Cas9 system relative to ZFNs or TALENs. For example, expression of Cas9 and multiple gRNAs has been used to induce small and large deletions or inversions between the DSBs, and to simultaneously introduce mutations altering different genes in rats, mouse ES cell clones, and zebrafish. Together, these advances in CRISPR/cas-mediated gene editing technology can accelerate the pace of gene-function relationship discovery and offer a focused approach for developing personalized therapeutics.

An alternative is the use of Cpf1, a putative class 2 CRISPR effector with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease that does not require tracrRNA. Instead, it utilizes a T-rich protospacer-adjacent motif, TTTN, on the 5′ side of the guide. The cut Cpf1 makes is staggered, occurring 19 bp after the PAM on the targeted (+) strand and 23 bp after it on the opposite strand. Cpf1 therefore generates a staggered cut with a 5′ overhang, in contrast to the blunt ends generated by Cas9. These sticky ends can be exploited for editing through non-homologous end joining (NHEJ): being able to program the exact sequence of a sticky end would allow researchers to design the DNA insert so that it integrates in the proper orientation. AsCpf1 (from Acidaminococcus) and LbCpf1 (from Lachnospiraceae) have been demonstrated to possess genomic editing capacity in human cells.
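Taking the offsets stated above at face value, the staggered Cpf1 cut and its overhangs can be computed from a sequence. This sketch assumes, for illustration, that both offsets are counted in top-strand coordinates from the 3′ end of a TTTN PAM lying on the top strand; with the 19 and 23 bp offsets above, that convention yields a 4-nt 5′ overhang. The function name and the coordinate convention are hypothetical, and the offsets are exposed as parameters rather than fixed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cpf1_stagger(seq: str, pam_start: int, pam_len: int = 4,
                 top_offset: int = 19, bottom_offset: int = 23):
    """Model a staggered cut downstream of a TTTN PAM on the top strand.

    Returns (top_cut, bottom_cut, right_overhang, left_overhang). Cut
    positions are top-strand indices (a cut at i falls between i-1 and i);
    each overhang is a 5' single-stranded end written 5'->3'.
    """
    seq = seq.upper()
    if seq[pam_start:pam_start + 3] != "TTT":
        raise ValueError("no TTTN PAM at pam_start")
    spacer_start = pam_start + pam_len
    top_cut = spacer_start + top_offset
    bottom_cut = spacer_start + bottom_offset
    if not top_cut < bottom_cut <= len(seq):
        raise ValueError("offsets must give top_cut < bottom_cut <= len(seq)")
    # Right-hand fragment: its top strand protrudes between the two cuts.
    right_overhang = seq[top_cut:bottom_cut]
    # Left-hand fragment: its bottom strand protrudes over the same span,
    # which reads 5'->3' as the reverse complement.
    left_overhang = right_overhang.translate(COMPLEMENT)[::-1]
    return top_cut, bottom_cut, right_overhang, left_overhang
```

For a sequence such as `"TTTA" + "ACGT" * 7`, the call `cpf1_stagger(seq, 0)` places the cuts at top-strand positions 23 and 27 and returns the complementary overhang pair `TACG` and `CGTA`; a donor designed to join either genomic end would need the matching complementary overhang.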

The elegance of the CRISPR/Cas or CRISPR/Cpf1 system lies in allowing it to be tailored to the patient's particular mutation, combined with delivery via adeno-associated virus (“AAV”) or adenovirus vectors, which are optimal vehicles for delivering genome editing machinery directly to the organ or cells of interest. There have been limited reports that, for example, systemic injection of an AAV vector carrying a zinc-finger nuclease and donor template construct was able to correct mutant transgenic clotting Factor IX in mice and reconstitute low but clinically detectable levels of circulating protein. In this regard, an AAV-CRISPR system could be delivered to treat dominant mutations via the same gene correction mechanism used for recessive mutations, compactly deliver targeted genomic editing machinery with a footprint small enough to be carried by viral vectors, remain constant and agnostic to the size of the target gene, and maintain endogenous gene expression stoichiometry. These and other advantages of CRISPR/Cas and CRISPR/Cpf1 editing give it a wide range of possible clinical applications.

Thus, for any genome editing system with engineered nucleases, an endonuclease can be joined to a DNA binding moiety, such as a zinc finger protein that binds donor DNA, or the donor DNA can be associated with the endonuclease by other methods. The proximity of the donor DNA to the double stranded break leads to increased integration efficiency of the donor DNA.

Despite these advances, a paramount technical problem remains: the low integration efficiency of donor DNA using genome editing tools. Herein, the Inventors describe a solution based on the realization that it is possible to bring the donor DNA into close proximity to the site of the double stranded break in the genomic DNA. This can be accomplished by creating a Cas9:DNA-binding protein motif fusion. By recruiting the donor DNA to the cleavage site, it is possible to use a donor DNA concentration that is non-toxic to cells yet provides a local donor DNA concentration that should improve integration efficiency. Because the donor DNA is held in such close proximity to the damaged DNA, one can engineer the sequence homology arms on both ends of the donor DNA to be relatively short (<1004) such that the cell's SOS DNA repair system favors 5′-3′ exonuclease V trimming, RPA coating of the single strand DNA, 3′ strand invasion, 3′ end trimming by the Rad1/Rad10 nuclease, and DNA nick ligation to repair the DNA, rather than purely Holliday junction dependent homologous recombination. If this type of DNA repair is favored, exploiting it yields a robust genome editing system that can be further optimized for integration of very large donor DNA fragments. By shunting stochastic processes towards favorable recombination events, recombinant fusion proteins increase the proximity of the DSB and the linear DNA for more efficient genome editing.

In order for this increased integration efficiency to take place, the donor DNA must be in a position that is as close, or closer, to the site of DNA damage than the two opposing ends of the damaged genomic DNA are to each other. Recruitment of the donor DNA to these sites is performed either by direct or indirect association of the donor DNA with the enzymatically active Cas-9 RNA-dependent endonuclease. These DNA binding elements entail specific domains or full-length proteins in their entirety, including but not limited to these naturally occurring or engineered examples: transcription factors, endonucleases, zinc fingers, TALENs, endonuclease-minus Cas-9+guide strand RNA, or other such ribonucleoproteins that can bind directly or indirectly to specific DNA sequences. Some, but certainly not all, of the possible configurations of Cas9-endonuclease and donor DNA binding elements are: direct fusion, association (multimerization domains like leucine zippers, fkbp/FRB, etc.), engineered association via antibody mimetics, or any synthetic macromolecule (carbohydrate-, protein-, or lipid-based) that binds to Cas9 endonuclease and also binds to the donor DNA in a sequence specific manner. Preliminary results suggest this modified Cas-9 works much better than wild-type Cas-9 in generating knock-in gene modifications in cell lines.

Described herein is a composition including a vector encoding a fusion protein including at least one endonuclease and a DNA binding moiety. In various embodiments, the fusion protein endonuclease includes at least one endonuclease selected from the group consisting of: clustered regularly interspaced short palindromic repeats (CRISPR) protein, zinc finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs). In other embodiments, the DNA binding moiety includes a zinc finger protein. In other embodiments, the fusion protein includes at least one endonuclease that is a CRISPR protein and a DNA binding moiety that is a zinc finger protein. In other embodiments, the zinc finger protein includes left handed zinc fingers, right handed zinc fingers, or both. In other embodiments, the zinc finger includes a left handed CCR5 sequence. In other embodiments, DNA binding moieties can include specific domains or full-length proteins in their entirety, including transcription factors, endonucleases, zinc fingers, TALENs, endonuclease-minus Cas-9+guide strand RNA, or other such ribonucleoproteins that can bind directly or indirectly to specific DNA sequences. In other embodiments, the at least one endonuclease and DNA binding moiety are joined by a linker including two, three, four, five, six, seven, eight, nine, ten or more amino acids. In other embodiments, configurations of Cas9-endonuclease and donor DNA binding moieties include: direct fusion, association (multimerization domains like leucine zippers, fkbp/FRB, etc.), engineered association via antibody mimetics, or any synthetic macromolecule (carbohydrate-, protein-, or lipid-based) that binds to Cas9 endonuclease and also binds to the donor DNA in a sequence specific manner. In other embodiments, the fusion protein further includes a nuclear localization signal (NLS), such as SV40 NLS. In other embodiments, the CRISPR protein is a Streptococcus pyogenes-derived cas protein. In other embodiments, the CRISPR protein is not a Streptococcus pyogenes-derived cas protein.
In various embodiments, the CRISPR protein is Cpf1, such as AsCpf1 from Acidaminococcus or LbCpf1 from Lachnospiraceae. In other embodiments, the CRISPR protein is Cas9. In other embodiments, the fusion protein includes a reporter protein. In various embodiments, the reporter protein includes a fluorescent protein, including green or red fluorescent protein (GFP or RFP, including enhanced GFP (eGFP)), mCherry, or similar proteins. In other embodiments, the vector is a DNA vector, a plasmid, or an artificial chromosome. In other embodiments, the vector is a virus, such as an adenovirus, an adeno-associated virus, or a lentivirus.

In other embodiments, the vector encodes one or more guide RNAs (gRNAs), wherein the one or more gRNAs include a sequence capable of binding to a protospacer adjacent motif (PAM). In other embodiments, the PAM includes the sequence NGG. In other embodiments, the PAM includes the sequence NAG. In other embodiments, the gRNA comprises a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA). In various embodiments, the gRNA is 10, 20, 30, 40, or more nucleotides in length. In various embodiments, about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more nucleotides are cognate to a gene of interest. In various embodiments, about 20 nucleotides are cognate to a genetic locus of interest; for example, this includes gRNA designs that hybridize to target sites of the form N₂₀NGG. In some embodiments, the CRISPR protein is Cas9. In other embodiments, the CRISPR protein is Cpf1. In various embodiments, the composition is used in a method for altering a target polynucleotide sequence in a cell, including contacting the polynucleotide sequence with a CRISPR protein (e.g., Cas9) and at least one gRNA that hybridizes to a cognate sequence on the target polynucleotide sequence and directs the CRISPR protein to it, wherein the target polynucleotide sequence is cleaved, and wherein the efficiency of alteration of cells that express the CRISPR protein is about 10-20%, 30-40%, 40-50%, or 50-80% or more. In various embodiments, the efficiency of alteration is improved 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, 20×, 30×, 40×, 50×, or 100× when compared to a method using a native wild-type endonuclease. Further described herein is a quantity of cells produced using the described method.
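As an illustration of the N₂₀NGG design rule, the following Python sketch lists every 20-nucleotide protospacer that is immediately followed by an NGG PAM (or, optionally, NAG) on either strand of a DNA sequence. The function names are hypothetical and this is not presented as the guide-selection method used herein; it performs no off-target or efficiency scoring, which dedicated guide-design tools address.

```python
"""Hypothetical sketch: enumerate SpCas9-style N20-NGG target sites."""

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(seq, spacer_len=20, pam_suffixes=("GG",)):
    """Return (strand, start, spacer, pam) for every spacer + PAM site.

    pam_suffixes=("GG",) matches NGG; ("GG", "AG") also accepts NAG.
    For '-' hits, `start` indexes the reverse-complement string.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Slide a window of spacer + 3-nt PAM along the strand.
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if pam[1:] in pam_suffixes:
                hits.append((strand, i, s[i : i + spacer_len], pam))
    return hits


if __name__ == "__main__":
    demo = "A" * 20 + "TGG"  # one NGG site on the + strand
    print(find_protospacers(demo))
```

Searching the reverse complement as a second "+ strand" keeps the window logic identical for both strands; candidates returned this way would still need to be screened for off-target matches.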

Further described is a method of genomic editing including providing a quantity of one or more vectors, each encoding a fusion protein including at least one endonuclease and a DNA binding moiety, and contacting a population of cells with the quantity of the one or more vectors. In various embodiments, the fusion protein endonuclease includes at least one endonuclease selected from the group consisting of a clustered regularly interspaced short palindromic repeats (CRISPR) protein, a zinc finger nuclease (ZFN), and a transcription activator-like effector nuclease (TALEN). In other embodiments, the DNA binding moiety includes a zinc finger protein. In other embodiments, the fusion protein includes at least one endonuclease that is a CRISPR protein and a DNA binding moiety that is a zinc finger protein. In other embodiments, the zinc finger protein includes left-handed zinc fingers, right-handed zinc fingers, or both. In other embodiments, the zinc finger includes a left-handed CCR5 sequence. In other embodiments, DNA binding moieties can include specific domains or full-length proteins in their entirety, including transcription factors, endonucleases, zinc fingers, TALENs, endonuclease-minus Cas-9 + guide-strand RNA, or other such ribonucleoproteins that can bind directly or indirectly to specific DNA sequences. In other embodiments, the at least one endonuclease and the DNA binding moiety are joined by a linker including two, three, four, five, six, seven, eight, nine, ten, or more amino acids. In other embodiments, configurations of the Cas9 endonuclease and donor DNA binding moieties include: direct fusion; association (multimerization domains such as leucine zippers, FKBP/FRB, etc.); engineered association via antibody mimetics; or any synthetic macromolecule (carbohydrate-, protein-, or lipid-based) that binds to Cas9 endonuclease and also binds to the donor DNA in a sequence-specific manner. In other embodiments, the fusion protein further includes a nuclear localization signal (NLS), such as the SV40 NLS.
In other embodiments, the CRISPR protein is a Streptococcus pyogenes-derived Cas protein. In other embodiments, the CRISPR protein is not a Streptococcus pyogenes-derived Cas protein. In various embodiments, the CRISPR protein is Cpf1, such as AsCpf1 from Acidaminococcus or LbCpf1 from Lachnospiraceae. In other embodiments, the CRISPR protein is Cas9. In other embodiments, the fusion protein includes a reporter protein. In various embodiments, the reporter protein includes a fluorescent protein, including green or red fluorescent protein (GFP or RFP, including enhanced GFP (eGFP)), mCherry, or similar proteins. In other embodiments, the method is an in vivo method. In other embodiments, the method is an in vitro method. In certain embodiments, the population of cells includes embryonic stem cells, including human or mouse embryonic stem cells. In various embodiments, the method includes generation of a double-stranded break (DSB) in the quantity of cells, wherein homologous recombination (HR) of the DSB results in editing of the genome of the cells. In other embodiments, HR includes non-homologous end joining (NHEJ) introducing missense or nonsense mutations into a protein expressed at the locus. In other embodiments, the missense or nonsense mutations result in a knockout of a target sequence in the genome. In other embodiments, HR includes homology-directed repair (HDR) that introduces template DNA. In other embodiments, the HDR results in a knock-in of a target sequence in the genome. In other embodiments, the template DNA is cognate to a target sequence. In other embodiments, the template DNA is cognate to a wild-type genetic sequence. In other embodiments, the template DNA contains an expression cassette, for example, including a sequence transcribed and translated into a protein of interest.
In other embodiments, the template DNA includes at least 80 bases of exact sequence homology both upstream and downstream of about 20 bases cognate to a target sequence, or cognate to a wild-type genetic sequence. In other embodiments, the upstream and downstream sequences are at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 250, 300, 350, 400, 450, 500, or more base pairs. In other embodiments, the template DNA includes a sequence for binding of a DNA binding moiety. In other embodiments, the template DNA includes about 12 base pairs constituting the left-handed CCR5 zinc finger binding site. In other embodiments, contacting a population of cells with the quantity of the one or more vectors includes transfection, electroporation, and/or lipofection. In other embodiments, the vector is a DNA vector, a plasmid, or an artificial chromosome. In other embodiments, the vector is a virus, such as an adenovirus, an adeno-associated virus, or a lentivirus.

In other embodiments, the vector encodes one or more guide RNAs (gRNAs), wherein the one or more gRNAs include a sequence capable of binding to a protospacer adjacent motif (PAM). In other embodiments, one or more exogenous gRNAs are introduced to the quantity of cells. In other embodiments, the one or more gRNAs include a sequence capable of binding to a PAM. In other embodiments, the PAM includes the sequence NGG. In other embodiments, the PAM includes the sequence NAG. In other embodiments, the gRNA comprises a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA). In various embodiments, the gRNA is 10, 20, 30, 40, or more nucleotides in length. In various embodiments, about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more nucleotides are cognate to a gene of interest. In various embodiments, about 20 nucleotides are cognate to a genetic locus of interest; for example, this includes gRNA designs that hybridize to target sites of the form N₂₀NGG. In some embodiments, the CRISPR protein is Cas9. In various embodiments, the composition is used in a method for altering a target polynucleotide sequence in a cell, including contacting the polynucleotide sequence with a CRISPR protein (e.g., Cas9) and at least one gRNA that hybridizes to a cognate sequence on the target polynucleotide sequence and directs the CRISPR protein to it, wherein the target polynucleotide sequence is cleaved, and wherein the efficiency of alteration of cells that express the CRISPR protein is about 10-20%, 30-40%, 40-50%, or 50-80% or more. In various embodiments, the efficiency of alteration is improved 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, 20×, 30×, 40×, 50×, or 100× when compared to a method using a native wild-type endonuclease. Further described herein is a quantity of cells produced using the described method.

For example, the method of genomic editing includes providing a quantity of one or more vectors, each encoding a fusion protein including at least one endonuclease and a DNA binding moiety, and contacting a population of cells with the quantity of the one or more vectors, wherein: the at least one endonuclease includes a CRISPR protein and the DNA binding moiety includes a zinc finger protein; contacting the population of cells with the quantity of the one or more vectors includes transfection, electroporation, and/or lipofection, and further includes gRNA and/or template DNA; and, after the population of cells is contacted, a double-stranded break (DSB) is generated and homology-directed repair (HDR) introduces the template DNA into the genome of the population of cells. In other embodiments, the template DNA is cognate to a target sequence. In other embodiments, the template DNA is cognate to a wild-type genetic sequence. In other embodiments, the template DNA contains an expression cassette, for example, including a sequence transcribed and translated into a protein of interest. In other embodiments, the template DNA includes at least 80 bases of exact sequence homology both upstream and downstream of about 20 bases cognate to a target sequence, or cognate to a wild-type genetic sequence. In other embodiments, the template DNA includes a sequence for binding of a DNA binding moiety. In other embodiments, the template DNA includes about 12 base pairs constituting the left-handed CCR5 zinc finger binding site.

Also described is a kit for genomic editing including a quantity of one or more vectors, each encoding a fusion protein including at least one endonuclease and a DNA binding moiety, for contacting a population of cells, and further including a template DNA cognate to a target sequence or a wild-type genetic sequence, the template DNA including at least 10 bases of exact sequence homology both upstream and downstream of about 20 bases cognate to the target sequence, or cognate to the wild-type genetic sequence. In other embodiments, the DNA binding moiety includes a zinc finger protein. In other embodiments, the fusion protein includes at least one endonuclease that is a CRISPR protein and a DNA binding moiety that is a zinc finger protein. In other embodiments, the kit includes one or more guide RNAs (gRNAs), wherein the one or more gRNAs include a sequence capable of binding to a protospacer adjacent motif (PAM).

In other embodiments, the upstream and downstream sequences are at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 250, 300, 350, 400, 450, 500, or more base pairs. In various embodiments, the fusion protein endonuclease includes at least one endonuclease selected from the group consisting of a clustered regularly interspaced short palindromic repeats (CRISPR) protein, a zinc finger nuclease (ZFN), and a transcription activator-like effector nuclease (TALEN). In other embodiments, the DNA binding moiety includes a zinc finger protein. In other embodiments, the fusion protein includes at least one endonuclease that is a CRISPR protein and a DNA binding moiety that is a zinc finger protein. In other embodiments, the zinc finger protein includes left-handed zinc fingers, right-handed zinc fingers, or both. In other embodiments, the zinc finger includes a left-handed CCR5 sequence. In other embodiments, DNA binding moieties can include specific domains or full-length proteins in their entirety, including transcription factors, endonucleases, zinc fingers, TALENs, endonuclease-minus Cas-9 + guide-strand RNA, or other such ribonucleoproteins that can bind directly or indirectly to specific DNA sequences. In other embodiments, the at least one endonuclease and the DNA binding moiety are joined by a linker including two, three, four, five, six, seven, eight, nine, ten, or more amino acids. In other embodiments, configurations of the Cas9 endonuclease and donor DNA binding moieties include: direct fusion; association (multimerization domains such as leucine zippers, FKBP/FRB, etc.); engineered association via antibody mimetics; or any synthetic macromolecule (carbohydrate-, protein-, or lipid-based) that binds to Cas9 endonuclease and also binds to the donor DNA in a sequence-specific manner. In other embodiments, the fusion protein further includes a nuclear localization signal (NLS), such as the SV40 NLS. In other embodiments, the CRISPR protein is a Streptococcus pyogenes-derived Cas protein.
In other embodiments, the CRISPR protein is not a Streptococcus pyogenes-derived Cas protein. In various embodiments, the CRISPR protein is Cpf1, such as AsCpf1 from Acidaminococcus or LbCpf1 from Lachnospiraceae. In other embodiments, the CRISPR protein is Cas9. In other embodiments, the fusion protein includes a reporter protein. In various embodiments, the reporter protein includes a fluorescent protein, including green or red fluorescent protein (GFP or RFP, including enhanced GFP (eGFP)), mCherry, or similar proteins. In various embodiments, the kit is capable of generating a double-stranded break (DSB) in the quantity of cells, wherein homologous recombination (HR) of the DSB results in editing of the genome of the cells. In other embodiments, HR includes non-homologous end joining (NHEJ) introducing missense or nonsense mutations into a protein expressed at the locus. In other embodiments, the missense or nonsense mutations result in a knockout of a target sequence in the genome. In other embodiments, HR includes homology-directed repair (HDR) that introduces template DNA. In other embodiments, the HDR results in a knock-in of a target sequence in the genome. In other embodiments, the template DNA is cognate to a target sequence. In other embodiments, the template DNA is cognate to a wild-type genetic sequence. In other embodiments, the template DNA contains an expression cassette, for example, including a sequence transcribed and translated into a protein of interest. In other embodiments, the template DNA includes at least 80 bases of exact sequence homology both upstream and downstream of about 20 bases cognate to a target sequence, or cognate to a wild-type genetic sequence. In other embodiments, the upstream and downstream sequences are at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 250, 300, 350, 400, 450, 500, or more base pairs. In other embodiments, the template DNA includes a sequence for binding of a DNA binding moiety.
In other embodiments, the template DNA includes about 12 base pairs constituting the left-handed CCR5 zinc finger binding site. In other embodiments, contacting a population of cells with the quantity of the one or more vectors includes transfection, electroporation, and/or lipofection. In other embodiments, the vector is a DNA vector, a plasmid, or an artificial chromosome. In other embodiments, the vector is a virus, such as an adenovirus, an adeno-associated virus, or a lentivirus.

In other embodiments, the vector encodes one or more gRNAs, wherein the one or more gRNAs include a sequence capable of binding to a PAM. In other embodiments, one or more exogenous gRNAs are introduced to the quantity of cells. In other embodiments, the one or more gRNAs include a sequence capable of binding to a PAM. In other embodiments, the PAM includes the sequence NGG. In other embodiments, the PAM includes the sequence NAG. In other embodiments, the gRNA comprises a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA). In various embodiments, the gRNA is 10, 20, 30, 40, or more nucleotides in length. In various embodiments, about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more nucleotides are cognate to a gene of interest. In various embodiments, about 20 nucleotides are cognate to a genetic locus of interest; for example, this includes gRNA designs that hybridize to target sites of the form N₂₀NGG. In some embodiments, the CRISPR protein is Cas9. In other embodiments, the CRISPR protein is Cpf1. In various embodiments, the kit is used in a method for altering a target polynucleotide sequence in a cell, including contacting the polynucleotide sequence with a CRISPR protein (e.g., Cas9) and at least one gRNA that hybridizes to a cognate sequence on the target polynucleotide sequence and directs the CRISPR protein to it, wherein the target polynucleotide sequence is cleaved, and wherein the efficiency of alteration of cells that express the CRISPR protein is about 10-20%, 30-40%, 40-50%, or 50-80% or more. In various embodiments, the efficiency of alteration is improved 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, 20×, 30×, 40×, 50×, or 100× when compared to a method using a native wild-type endonuclease. Further described herein is a quantity of cells produced using the described method.

Also described herein is a method of multi-locus genomic editing including inducing sticky-end formation at one or more loci by adding a CRISPR protein, providing a quantity of one or more guide-strand RNAs, ligating one or more single-stranded donor DNAs, hybridizing one or more double-stranded DNAs with a terminating oligonucleotide, synthesizing one or more double-stranded DNAs from the one or more single-stranded donor DNAs to the one or more double-stranded DNAs, completing the donor DNA strand to form a sticky end, and joining compatible sticky ends at one or more loci. In various embodiments, providing a quantity of one or more guide-strand RNAs includes T7 RNA synthesis of guide-strand RNA. In other embodiments, ligating one or more single-stranded donor DNAs includes splint-mediated ligation of a single-stranded donor DNA sequence onto the 5′ end of the guide-strand RNA. In various embodiments, the 5′ nucleotide of the synthesized donor DNA has an exonuclease-resistant phosphorothioate bond with its neighboring nucleotide. In other embodiments, hybridization of a double-stranded DNA terminating oligonucleotide occurs. In other embodiments, synthesis of one or more double-stranded DNAs from the one or more single-stranded donor DNAs to the one or more double-stranded DNAs, completing the donor DNA strand, includes an isothermal DNA polymerase (Klenow exo-) reaction. In other embodiments, synthesis of one or more double-stranded DNAs from the one or more single-stranded donor DNAs to the one or more double-stranded DNAs, completing the donor DNA strand to form a sticky end, is by a T4 DNA ligase reaction. In various embodiments, complete integration happens upon ligation of the sticky end of the donor DNA and homologous recombination between donor and genomic DNA due to their proximity and sequence homology. In various embodiments, the PAM is TTTN. In various embodiments, the Cpf1 sticky end is 19 bp after the PAM on the targeted (+) strand and 23 bp after the PAM on the opposite strand, leaving a 5′ overhang.
In various embodiments, the Cpf1 is AsCpf1 from Acidaminococcus or LbCpf1 from Lachnospiraceae.
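The staggered Cpf1 cut described above can be expressed as simple coordinate arithmetic. The Python sketch below assumes that the 19- and 23-base offsets are counted from the first base 3′ of a 5′-TTTN PAM on the + strand, treats those offsets as fixed, and reports the 4-nucleotide (23 - 19) 5′ overhang carried by the + strand of the downstream fragment. The function name is hypothetical.

```python
"""Hypothetical sketch: locate a Cpf1-style staggered cut downstream of a TTTN PAM."""


def cpf1_sticky_end(seq, pam_start, plus_offset=19, minus_offset=23):
    """Return (plus_cut, minus_cut, overhang) in + strand coordinates.

    Offsets are counted from the first base after the 4-nt TTTN PAM.
    `overhang` is the single-stranded 5' overhang read on the + strand
    of the downstream fragment (minus_offset - plus_offset bases long).
    """
    seq = seq.upper()
    if seq[pam_start : pam_start + 3] != "TTT":
        raise ValueError("expected a TTTN PAM at pam_start")
    pam_end = pam_start + 4             # first base 3' of the PAM
    plus_cut = pam_end + plus_offset    # + strand cut lies before this index
    minus_cut = pam_end + minus_offset  # opposite-strand cut, same coordinates
    if minus_cut > len(seq):
        raise ValueError("sequence too short for the requested cut")
    return plus_cut, minus_cut, seq[plus_cut:minus_cut]


if __name__ == "__main__":
    target = "TTTA" + "ACGT" * 8
    print(cpf1_sticky_end(target, 0))  # overhang is 23 - 19 = 4 nt
```

Because both ends of the break carry complementary 4-nt 5′ overhangs, a donor whose end is completed to the matching overhang can be joined to the genomic sticky end by ligation, as the method above describes.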

Example 1 Experimental Design

As described, genetic knockouts are relatively easy to produce with Cas-9, through Cas-9 DNA cleavage and emergency DNA repair systems, whereas gene knock-ins/fusions are much more challenging because: larger DNA inserts (GFP = 717 base pairs) integrate with low efficiency; the integration rate depends on insert DNA concentration; and higher concentrations of linear DNA are toxic to cells.

By actively recruiting the linear DNA insert into closer proximity to the genomic cleavage site, one can increase the integration efficiency of large DNA fragments into the genome, use lower linear DNA concentrations without sacrificing efficiency, and quickly screen through various protein configurations in a cell culture system. A convenient biological (fluorescent) readout allows for quick optimization.

To enable quick and easy analysis of CRISPR/Cas9 optimization, a GFP:β-actin reporter system was developed, including: Step 1) building the Cas9:CCR5:NLS fusion protein; Step 2) synthesizing sgRNA against human genomic β-actin; Step 3) designing a suitable linear GFP donor DNA sequence optimized for efficient DNA repair; and Step 4) screening for green cells and optimizing if needed.

Example 2 Study Model

GFP-β-actin is our first genomic GFP-tagged target protein for the following reasons: it is constitutively expressed, so cells can be screened quickly; and any frame shift and/or cleavage that knocks out actin results in cell death. The two fluorescent labels are the means of screening: GFP reports integration efficiency and mCherry reports cell viability. GFP can only be expressed if it is integrated into the genome in frame with the surrounding sequence. mCherry should be expressed in all living transfected cells.

The experimental conditions included: 1) mCherry only (transfection efficiency/cell viability control); 2) Cas-9-NLS, sgRNA, donor DNA, mCherry (NHEJ efficiency); 3) donor DNA, sgRNA, mCherry (cell viability due to linear DNA); 4) Cas-9-ZF, donor DNA, mCherry (off-target CCR5 effects); and 5) Cas-9-ZF, sgRNA, donor DNA, mCherry (test).

Fluorescent signal allows rapid determination of cell viability in response to the constituent reagents in the transfection, in order to optimize future electroporations. GFP signal will only come from cells positive for integration, and these colonies can be isolated, collected, expanded, genotyped to determine GFP insertion sites, and imaged to determine proper protein localization.

Example 3 Methods

A vector encoding Cas-9-NLS was purchased from Addgene (Plasmid #42251). The C-terminal NLS was removed from the 3′ end of the coding sequence and replaced with the coding sequence for a short 3× glycine-serine linker (6 amino acids total) and the left-handed CCR5 zinc finger, and the previously removed NLS was then added back. The vector was linearized and used as the template for a T7 mMESSAGE® capped mRNA synthesis kit. The RNA is cleaned up using an RNeasy kit, followed by phenol/chloroform extraction and ethanol precipitation.

Guide-strand RNA is synthesized using the following protocol. A GFP-gene fusion (GFP-β-actin) had been published as a functionally viable protein and therefore became the desired genomic GFP-tagged endogenous protein. PCR primers were designed to amplify the sequence surrounding this region for targeted insertion within the HEK293 cell line, and the PCR product was sequenced to confirm the accuracy of the published chromosomal sequence data. Twenty bases of sequence within the genome are chosen as the hybridization target for the guide-strand RNA and run through several online algorithms, both to maximize guide-strand hybridization to the genome and to minimize off-target hybridization. After a sequence is selected, a 120 bp DNA template is assembled by Klenow fill-in and PCR amplification from 4 ssDNA oligonucleotides ordered from IDT, only one of which is unique to each guide strand (i.e., only one oligo, costing $18, must be purchased for a unique sgRNA). The template consists of a T7 promoter, 20 bases of unique homology to the genomic target, and 80 bases that encode the Cas9-binding hairpins. The recipe is as follows: 5 pmol DNA template, 125 µM NTP mix, transcription buffer (HEPES-MgCl₂, DTT, spermidine, pH 7.5), and 150 µg T7 RNA polymerase (20 µl, made in house) in a total volume of 200 µl. The reaction is incubated at 37 °C for 2-4 hours until a white precipitate of magnesium pyrophosphate collects at the bottom of the tube. 25 µl of 0.5 M EDTA is added to dissolve the precipitate and halt further polymerization. The newly synthesized RNA is run on an 8% acrylamide urea-PAGE gel to separate unincorporated NTPs, truncated RNAs, and the DNA template from the full-length 100-base sgRNA. The RNA is visualized by UV shadowing over a TLC plate, and the bands are cut out of the gel and electroeluted using a Whatman Elutrap. The RNA is then ethanol precipitated and dissolved in 10 mM Tris pH 8.0 to the desired concentration.
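The quantities in the transcription recipe can be checked with a few lines of arithmetic. This Python sketch, with hypothetical helper names, computes the expected full-length transcript (20 guide bases plus 80 hairpin bases), the final template concentration in the 200 µl reaction, and the EDTA concentration after 25 µl of 0.5 M EDTA is added. It assumes the T7 promoter portion of the template is not part of the transcript, consistent with the 100-base product described.

```python
"""Hypothetical sketch: sanity-check the sgRNA in vitro transcription recipe."""

GUIDE_NT = 20          # unique homology to the genomic target
SCAFFOLD_NT = 80       # Cas9-binding hairpin region
TEMPLATE_PMOL = 5.0    # DNA template per reaction
REACTION_UL = 200.0    # total transcription volume
EDTA_STOCK_MM = 500.0  # 0.5 M EDTA stock
EDTA_ADDED_UL = 25.0


def transcript_length(guide_nt=GUIDE_NT, scaffold_nt=SCAFFOLD_NT):
    """Expected full-length sgRNA, assuming the promoter is not transcribed."""
    return guide_nt + scaffold_nt


def template_nM(pmol=TEMPLATE_PMOL, volume_ul=REACTION_UL):
    """pmol per µl equals µM, so multiply by 1000 to get nM."""
    return pmol / volume_ul * 1000.0


def edta_final_mM(stock_mM=EDTA_STOCK_MM, added_ul=EDTA_ADDED_UL,
                  volume_ul=REACTION_UL):
    """EDTA concentration after dilution into the finished reaction."""
    return stock_mM * added_ul / (volume_ul + added_ul)


if __name__ == "__main__":
    print(transcript_length())        # 100 nt
    print(template_nM())              # about 25 nM
    print(round(edta_final_mM(), 1))  # ~55.6 mM
```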

After some test integrations, it was determined that the donor DNA sequence should consist of at least 80 bases of exact sequence homology both upstream and downstream of the 20 bases required by Cas-9 to cause the double-stranded DNA break. These 80 bp upstream and 80 bp downstream homology arms are engineered onto the 5′ and 3′ ends of the donor DNA sequence that is to be added into the genome. Notably, the sequencing data used to determine the optimal sgRNA hybridization sequence is also useful for generating precise homology arms. The 12 base pairs constituting the left-handed CCR5 zinc finger binding site are placed strategically so as not to interfere with either the final coding sequence or splicing of the mRNA transcribed from the target locus. Super-folder GFP (sfGFP), mCerulean, and tagRFP were codon optimized, through silent mutations, so that any consensus mRNA splice donor and splice acceptor sequences were removed from both the sense and antisense strands. The β-actin homology arms and CCR5 binding site were extended in both directions from sfGFP by PCR. The linear DNA was purified from an LMP agarose gel, extracted using a Qiagen Gel Extraction Kit, phenol/chloroform extracted, ethanol precipitated, and dissolved in TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) to the desired concentration.
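The donor layout described above (homology arms taken from the sequenced locus on either side of the cut, flanking the zinc finger binding site and the insert) can be sketched as a small helper. This is a hypothetical Python illustration, not the procedure used to build the donors herein: the reference sequence, cut position, and binding-site sequence are caller-supplied placeholders, and the optional in-frame check assumes the binding site and insert are both read through as coding sequence, as in a direct protein fusion.

```python
"""Hypothetical sketch: assemble a linear HDR donor with homology arms."""


def build_donor(reference, cut_pos, insert, zf_site="", arm_len=80,
                require_in_frame=True):
    """Return left_arm + zf_site + insert + right_arm.

    reference : sequenced genomic region (5'->3', + strand)
    cut_pos   : index in `reference` of the first base after the break
    insert    : payload, e.g. linker + sfGFP coding sequence
    zf_site   : zinc finger binding site (e.g. the 12-bp CCR5 site)
    """
    if cut_pos - arm_len < 0 or cut_pos + arm_len > len(reference):
        raise ValueError("reference does not cover both homology arms")
    payload = zf_site + insert
    # A protein fusion stays in frame only if the added bases are a
    # multiple of three (no net frame shift in the downstream exon).
    if require_in_frame and len(payload) % 3 != 0:
        raise ValueError("payload length is not a multiple of 3")
    left_arm = reference[cut_pos - arm_len : cut_pos]
    right_arm = reference[cut_pos : cut_pos + arm_len]
    return left_arm + payload + right_arm


if __name__ == "__main__":
    ref = "A" * 100 + "C" * 100  # placeholder locus
    donor = build_donor(ref, 100, "G" * 12, zf_site="T" * 12)
    print(len(donor))            # 80 + 12 + 12 + 80 = 184
```

Setting `require_in_frame=False` covers layouts in which the binding site sits in non-coding sequence rather than inside the fused reading frame.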

Because of the mixed nature of the components going into the transfection (100-base sgRNA, 4500-base Cas-9-ZF mRNA, 6500 bp Ubi6::mCherry plasmid as transfection control, and 890 bp linear donor DNA), electroporation was chosen as the delivery method rather than lipophilic or receptor-mediated transfection reagents. 1×10⁶ cells are suspended in 1× PBS containing 2 µg sgRNA, 1.5 µg Cas-9-ZF mRNA, 50 ng donor DNA, and 1 µg plasmid DNA in 40 µl total volume. The cells are electroporated with a square-wave pulse (42 V, 50 ms) supplied by a Harvard Apparatus BTX-840. The cells are then plated on 60 mm glass-bottom cell culture dishes for ease of imaging, analysis, and clonal harvesting. HEK293 cells are grown in DMEM with 5% FBS and gentamicin. To make our zebrafish transgenic lines, the CRISPR/Cas mRNA or protein, sgRNA, and donor DNA mixture is injected into zebrafish embryos at 10-30 minutes post fertilization.
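To compare the electroporation components on a molar basis, the following Python sketch converts the stated masses to picomoles. It assumes approximate average molecular weights of 650 g/mol per base pair for double-stranded DNA and 340 g/mol per nucleotide for single-stranded RNA; these are common rule-of-thumb values, not measurements of these specific molecules, and the helper name is hypothetical.

```python
"""Hypothetical sketch: convert the electroporation masses to picomoles."""

DS_DNA_G_PER_BP = 650.0  # approximate average, double-stranded DNA
SS_RNA_G_PER_NT = 340.0  # approximate average, single-stranded RNA


def pmol(mass_ng, length, g_per_mol_per_unit):
    """ng divided by g/mol gives nmol; multiply by 1000 for pmol."""
    return mass_ng / (length * g_per_mol_per_unit) * 1000.0


# (component, mass in ng, length in nt or bp, per-unit molecular weight)
MIX = [
    ("sgRNA", 2000.0, 100, SS_RNA_G_PER_NT),
    ("Cas-9-ZF mRNA", 1500.0, 4500, SS_RNA_G_PER_NT),
    ("donor DNA", 50.0, 890, DS_DNA_G_PER_BP),
    ("Ubi6::mCherry plasmid", 1000.0, 6500, DS_DNA_G_PER_BP),
]

if __name__ == "__main__":
    for name, ng, length, mw in MIX:
        print(f"{name}: {pmol(ng, length, mw):.3f} pmol")
```

Under these assumptions the 40 µl mix contains roughly 59 pmol sgRNA, about 1 pmol Cas-9-ZF mRNA, about 0.24 pmol plasmid, and under 0.1 pmol donor DNA, i.e., a large molar excess of guide RNA over the other components.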

Example 4 Preliminary Results

As shown in FIG. 2, under the wild-type condition only 5 of 18 CFP-positive colonies were also GFP positive, 2 of which were GFP positive only (i.e., stable integrants). By contrast, using the described Cas-9 zinc finger system, 9 of 12 CFP-positive colonies were also GFP positive, 4 of which were GFP positive only. The results in FIG. 3 demonstrate that adding the zinc finger to Cas9 does not confer additional zinc finger-dependent endonuclease activity (FIG. 3A); three examples of GFP signal due to integration of donor DNA are shown in FIG. 3B. The GFP-positive clones are being validated (FIG. 4). mCherry is the internal viability control. This work was done in HEK293T cells to generate GFP fused to endogenous β-actin.

This proof of concept for labeling endogenous proteins in stem cells can readily be translated to the rapid generation of transgenic animals. The ability to recruit donor DNA to the Cas-9 break site via the Cas-9:CCR5 zinc finger increases the integration of foreign DNA (723 base pairs, GFP) 2.7-fold over wild-type Cas-9 in this first iteration. Adding more genomic sequence on either side of the donor DNA should yield still higher integration efficiencies.
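The 2.7-fold figure is consistent with the FIG. 2 colony counts: the fraction of CFP-positive colonies that were also GFP positive rose from 5/18 (wild-type) to 9/12 (Cas-9 zinc finger). The Python sketch below reproduces that ratio with exact fractions and computes the corresponding ratio for GFP-only (stable) colonies, 2/18 versus 4/12. That the 2.7-fold value was derived in exactly this way is an assumption; with colony counts this small, either ratio carries substantial sampling uncertainty.

```python
"""Sketch: fold change in GFP-positive fraction from the FIG. 2 colony counts."""
from fractions import Fraction

# Colony counts per condition (FIG. 2).
WILD_TYPE = {"gfp_any": 5, "gfp_only": 2, "cfp_total": 18}
CAS9_ZF = {"gfp_any": 9, "gfp_only": 4, "cfp_total": 12}


def fold_change(key):
    """Ratio of the Cas-9-ZF fraction to the wild-type fraction for `key`."""
    zf = Fraction(CAS9_ZF[key], CAS9_ZF["cfp_total"])
    wt = Fraction(WILD_TYPE[key], WILD_TYPE["cfp_total"])
    return zf / wt


if __name__ == "__main__":
    print(float(fold_change("gfp_any")))   # 2.7
    print(float(fold_change("gfp_only")))  # 3.0
```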

Example 5 Subsequent Projects and Applications

Once proper conditions for integration are worked out, one can genetically tag Oct4 with tagRFP, Sox2 with CFP, and Nanog with sfGFP in separate mouse ES cell lines, and combine them in all possible permutations. These cell lines will be useful in their own right, and also for generating transgenic mouse lines in which we are able to visualize absolute amounts and localizations of these developmentally important proteins, expressed at endogenous levels as fusion proteins.

Additionally, one can bypass in vivo translation of the Cas-9 mutant by expressing it in bacteria and subsequently purifying the Cas-9 fusion protein. This purified protein would be premixed and equilibrated with both the guide-strand RNA and the donor DNA necessary for each unique genomic integration. These preassembled endonuclease/donor DNA integration units could be injected or transfected into embryos or any cell line. This configuration allows not only genomic editing at the earliest stages but also multiple simultaneous integration events with high efficiency. This could enable the first generation of a mouse ES cell line carrying all three of the previously described integrations simultaneously, as a multiple knock-in line. In other examples, RNA probes such as Spinach (1 or 2) could also be targeted into the genome to provide visual readouts of nascent transcription and/or mRNA splicing events.

Example 6 Construct Formats

Additionally, one could potentially combine all of the elements of this CRISPR/Cas-9:fusion, donor DNA-targeting system into a single virus. The attenuated virus would express the Cas-9:fusion and guide-strand RNA under an orthogonally induced promoter (e.g., ecdysone receptor-driven expression, which would work in all species except insects) or a cell type-specific promoter, and the viral genome would contain the donor DNA sequence flanked on both sides by the 20 base pairs recognized by the guide-strand RNA. This configuration would generate linear donor DNA with the requisite flanking homology to the genomic target, which would be recruited to the target locus by the Cas-9:fusion-sgRNA endonuclease. This viral construct could be used to correct genetic mutations, make druggable targets more sensitive to drugs, or make previously non-druggable targets druggable, and would self-destruct in the process.

Example 7 Exemplary Sequences Used in Applications

A variety of constructs were prepared to test the various designs described herein. In one example, a Cas-9 zinc finger fusion protein was generated, including the following elements: the Cas-9 endonuclease, a short 9-amino-acid linker (ssagagaga, SEQ ID NO:9), the left-handed CCR5 zinc finger, 4 amino acids (wrlp, SEQ ID NO:10), a nuclear localization sequence, and a stop codon. The nucleotide sequence is described in SEQ ID NO:1, and the amino acid sequence of the construct is described in SEQ ID NO:2. For Ni-NTA purification of the Cas-9 fusion protein, the Inventors added a 6× histidine tag to the extreme carboxy-terminal end of the protein.
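When annotating a fusion construct like this one, it can help to compute residue boundaries for each element. The Python sketch below does this for an ordered list of segments. Only the linker (SSAGAGAGA) and the WRLP spacer come from this example; the Cas-9 and CCR5 zinc finger entries are short placeholders standing in for the real sequences in SEQ ID NO:2, and using the SV40 NLS (PKKKRKV) and placing the 6× His tag immediately before the stop codon are assumptions for illustration. All names are hypothetical.

```python
"""Hypothetical sketch: residue boundaries of a Cas-9 zinc finger fusion."""


def domain_map(segments):
    """Return [(name, first, last)] using 1-based, inclusive residue numbers."""
    layout, pos = [], 1
    for name, aa in segments:
        layout.append((name, pos, pos + len(aa) - 1))
        pos += len(aa)
    return layout


def coding_length(segments, stop_codon=True):
    """Open reading frame length in bases (3 per residue, plus a stop codon)."""
    n_aa = sum(len(aa) for _, aa in segments)
    return 3 * n_aa + (3 if stop_codon else 0)


CONSTRUCT = [
    ("Cas-9 (placeholder)", "M" + "X" * 9),  # real sequence: SEQ ID NO:2
    ("linker", "SSAGAGAGA"),                 # SEQ ID NO:9
    ("CCR5 ZF (placeholder)", "Z" * 5),
    ("spacer", "WRLP"),                      # SEQ ID NO:10
    ("NLS (assumed SV40)", "PKKKRKV"),
    ("6xHis", "HHHHHH"),
]

if __name__ == "__main__":
    for row in domain_map(CONSTRUCT):
        print(row)
    print(coding_length(CONSTRUCT), "bp including stop")
```

Substituting the full Cas-9 and zinc finger sequences for the placeholders would give the actual residue coordinates of each element.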

Other donor DNA sequences used or synthesized include a super-folder GFP:beta-actin donor (SEQ ID NO:3) used in the human embryonic kidney cell line HEK293. This sequence includes 80 base pairs of sequence homology upstream of the Cas-9 endonuclease cleavage site, the left-handed CCR5 binding sequence, 12 bases encoding a short linker, super-folder GFP, and 80 base pairs of sequence homology downstream of the endonuclease cleavage site.

Another example is a Sox-2:PS-mOrange2:Spinach2 donor (SEQ ID NO:4) for use in zebrafish (D. rerio). This sequence includes 81 base pairs of sequence homology upstream of the Cas-9 endonuclease cleavage site, silent mutations to the Sox-2 carboxy-terminal end, 48 bases coding for 16 glycine/serine residues, photoswitchable orange fluorescent protein (PS-mOrange2), 144 base pairs of non-coding sequence, the left-handed CCR5 binding sequence, Spinach2 (an mRNA fluorescent reporter), and 82 base pairs of sequence homology downstream of the endonuclease cleavage site.

An additional example is an Oct-4:GFP donor (SEQ ID NO:5) for use in mouse (M. musculus). This sequence includes 82 base pairs of sequence homology upstream of the Cas-9 endonuclease cleavage site, 27 bases encoding a glycine/alanine linker, super-folder GFP, the left-handed CCR5 binding sequence, and 94 base pairs of sequence homology downstream of the endonuclease cleavage site.

Finally, a variety of guide-strand RNA sequences were used, including human beta-actin (SEQ ID NO:6), zebrafish Sox-2 (SEQ ID NO:7), and mouse Oct-4 (SEQ ID NO:8).

Example 8 Experimental Protocol

An exemplary experimental protocol for applying the above constructs is as follows:

-   All Cas9 protein/sgRNA/donor DNA complexes are allowed to equilibrate at room temperature for 30 minutes.
-   Human beta-actin sgRNA and donor DNA can be used.
-   A “1:2:3” mixture strategy is used (e.g., 1 pmol plasmid: 2 pmol sgRNA: 3 pmol protein: 3 pmol of sgRNA) with 90 ng (0.013 pmol) of plasmid DNA. These conditions are optimized for minimal protein usage and a clear read-out on the gel.
-   Cas9 reactions are carried out at 28 °C for two hours.
-   An XmaI digest (single cut) is performed as a two-hour reaction at 37 °C.
-   Samples are cleaned up with Qiagen PCR clean-up reagents/columns prior to gel electrophoresis.
-   Samples are run on a 1% agarose gel in TAE.
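
The mass-to-moles bookkeeping behind the 1:2:3 mixture can be sketched as below. This assumes an average of about 650 g/mol per base pair of double-stranded DNA (an approximation; some references use 660), and the helper names are hypothetical. The plasmid size is not given in the text; under this assumption, 90 ng corresponding to 0.013 pmol implies a plasmid of roughly 10 to 11 kb. The sketch scales only the three components named by the 1:2:3 ratio (plasmid, sgRNA, protein).

```python
# Assumption: ~650 g/mol per base pair of double-stranded DNA (an average;
# some references use 660). Helper names are hypothetical.
AVG_BP_MASS = 650.0


def dsdna_pmol(ng: float, length_bp: float) -> float:
    """Convert a dsDNA mass (ng) to picomoles: ng * 1000 / (bp * 650)."""
    return ng * 1000.0 / (length_bp * AVG_BP_MASS)


def implied_length_bp(ng: float, pmol: float) -> float:
    """Invert the conversion: the plasmid length implied by ng and pmol."""
    return ng * 1000.0 / (pmol * AVG_BP_MASS)


def scale_ratio(plasmid_pmol: float, ratio=(1, 2, 3)) -> dict:
    """Scale the plasmid : sgRNA : protein molar ratio to a plasmid amount."""
    names = ("plasmid", "sgRNA", "Cas9 fusion")
    return {n: plasmid_pmol * r / ratio[0] for n, r in zip(names, ratio)}


if __name__ == "__main__":
    print(round(implied_length_bp(90, 0.013)))   # 10651
    print(scale_ratio(0.013))
```

Because 0.013 pmol is itself rounded to two significant figures, the implied length should be read as approximate rather than as the plasmid's actual size.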

The above protocol demonstrates that the Cas9-zinc finger fusion protein linearizes plasmid DNA efficiently (FIG. 5). Some off-target cleavage appears to occur due to the presence of the zinc finger; however, off-target cleavage is ameliorated when donor DNA is included. This approach has yielded optimized conditions for injections into zebrafish, as well as in vitro confirmation that the beta-actin guide strand and the Cas-9 fusion are functional together. Further confirmation can be obtained by fluorescence-activated cell sorting (FACS) of frozen GFP:beta-actin HEK cells. Once a GFP-sorted culture is prepared, genomic analysis can confirm editing.

Example 9 All-In-One CRISPR Editing

In an alternative embodiment, mixtures of specific guide-strand RNA-donor DNA hybrids could allow for parallel, multi-locus mutations. More specifically, mixtures of specific guide-strand RNA-donor DNA hybrids are deployed with a single CRISPR protein preparation. This approach would allow a researcher to make simultaneous additions/mutations to multiple loci in the genome of any organism or cell line.

Using Cpf1 instead of Cas9 for CRISPR genome editing allows one to take advantage of highly efficient sticky-end ligation between donor DNA and genomic DNA during repair. A partial unnatural nucleotide backbone in the donor DNA assembly primers makes the sticky end of the donor DNA less vulnerable to degradation, and this preservation should, in theory, allow much higher ligation efficiencies.
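
Cpf1 (Cas12a) makes a staggered cut that leaves a short 5′ overhang (roughly 4 to 5 nt) on the genomic ends, so a donor can only be ligated directly if its own 5′ overhang base-pairs with the genomic one. A minimal compatibility check, assuming both overhangs are written 5′→3′ and using hypothetical helper names and sequences, is:

```python
# Assumption: both overhangs are written 5'->3' and are DNA (A/C/G/T).
# Helper names and sequences are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_anneal(donor_overhang: str, genomic_overhang: str) -> bool:
    """Two 5' overhangs can base-pair (and be ligated) only if they have the
    same length and one is the reverse complement of the other."""
    return (len(donor_overhang) == len(genomic_overhang)
            and donor_overhang.upper() == revcomp(genomic_overhang))


if __name__ == "__main__":
    genomic = "TTCGA"                             # hypothetical Cpf1 overhang
    print(overhangs_anneal("TCGAA", genomic))     # True
    print(overhangs_anneal("TTCGA", genomic))     # False
```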

Parallel, multi-locus genetic mutations for developing disease or research models in any organism can be made, allowing simultaneous editing of any number of target genes simply by adding further guide-strand RNA/donor DNA fusions.
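
Because each guide-strand RNA/donor DNA hybrid carries both the targeting sequence and the edit, a multiplex experiment reduces to assembling a pool of distinct hybrids for one shared protein preparation. The sketch below (hypothetical class, helper, and sequences) indexes such a pool by locus and rejects duplicated loci or targeting sequences, since a repeated spacer would direct two different donors to the same cut site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideDonorHybrid:
    locus: str    # hypothetical target gene label
    spacer: str   # guide-strand targeting sequence (sets cut specificity)
    donor: str    # donor DNA carrying the locus-specific edit


def build_pool(hybrids):
    """Index hybrids by locus for use with one shared protein preparation.

    Each hybrid must uniquely specify one cut site and one edit, so a
    repeated locus or spacer is rejected.
    """
    pool = {}
    spacers = set()
    for h in hybrids:
        if h.locus in pool or h.spacer in spacers:
            raise ValueError(f"duplicate locus or spacer for {h.locus!r}")
        pool[h.locus] = h
        spacers.add(h.spacer)
    return pool


if __name__ == "__main__":
    pool = build_pool([
        GuideDonorHybrid("geneA", "ACGTACGTACGTACGTACGT", "donorA"),
        GuideDonorHybrid("geneB", "TTGACCAGTTGACCAGTTGA", "donorB"),
    ])
    print(sorted(pool))  # ['geneA', 'geneB']
```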

While the above is described using the CRISPR/Cpf1 system, which is efficient for these purposes, the approach appears compatible with a variety of current or future RNA-dependent endonucleases, such as Cas9, among others.

The guide-strand/donor fusion is assembled as shown in FIG. 6. More specifically:

-   1) T7 RNA synthesis of the guide-strand RNA (FIG. 6A).
-   2) Splint (FIG. 6B)-mediated ligation (FIG. 6C) of the single-stranded donor DNA sequence (FIG. 6D) onto the 5′ end of the guide-strand RNA. The 5′ nucleotide of the synthesized donor DNA (FIG. 6E) is linked to its neighboring nucleotide by an exonuclease-resistant phosphorothioate bond.
-   3) Hybridization of a double-stranded DNA terminating oligonucleotide (FIG. 6F).
-   4) An isothermal DNA polymerase (Klenow exo-) reaction to fill in the complementary strand (FIG. 6G) from the splinting primer (step 2) to the terminating oligonucleotide (step 3), followed by a T4 DNA ligase reaction (FIG. 6H) that completes the donor fragment with the appropriate 5′ sticky end (FIG. 6I) for ligation to the sticky end of the genomic DNA (FIG. 6J) digested by Cpf1.
-   5) Complete integration occurs upon ligation of the sticky end (FIG. 6K) of the donor DNA and homologous recombination between donor and genomic DNA (FIG. 6), owing to their proximity and sequence homology.

Given that each sgRNA/donor hybrid is unique and confers both the digestion specificity and the specific donor DNA mutation for the targeted gene, one can combine these hybrid nucleotides with a single protein preparation and obtain multiple targeted mutations simultaneously.
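
The assembly in steps 1 to 4 can be followed as string bookkeeping. This is a sketch under stated assumptions, not a model of the chemistry: the donor's 3′ end is taken to be joined to the guide RNA's 5′ end (giving 5′-DNA-RNA-3′), the fill-in is assumed to leave the donor strand's 5′-most nucleotides (those protected by the phosphorothioate bond) unpaired as the 5′ sticky end, the terminating oligonucleotide of step 3 is not modeled separately, and `assemble_hybrid` and all sequences are hypothetical.

```python
# Assumptions: donor 3' end joined to guide RNA 5' end (5'-DNA-RNA-3');
# fill-in leaves the donor's 5'-most nucleotides unpaired as the sticky end.
# Helper names and sequences are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def assemble_hybrid(guide_rna: str, donor_ssdna: str, overhang_len: int) -> dict:
    """Bookkeeping for steps 1-4 of the guide-strand/donor assembly."""
    if not 0 < overhang_len < len(donor_ssdna):
        raise ValueError("overhang must be shorter than the donor")
    # Step 2: ligation yields one strand, 5'-[donor DNA][guide RNA]-3'.
    # Lowercase marks the RNA portion.
    top = donor_ssdna + guide_rna.lower()
    # Step 4: fill-in synthesizes the complement of the donor except its
    # 5'-terminal overhang, which remains single-stranded.
    bottom = revcomp(donor_ssdna[overhang_len:])
    return {"top": top, "bottom": bottom,
            "sticky_end": donor_ssdna[:overhang_len]}


if __name__ == "__main__":
    hybrid = assemble_hybrid("GAUCGAUCGAUCGAUCGAUC", "TCGAAATGGCTAGCAAGG", 5)
    print(hybrid["sticky_end"])   # TCGAA
    print(hybrid["bottom"])       # CCTTGCTAGCCAT
```

With a hypothetical genomic Cpf1 overhang of TTCGA, the sticky end TCGAA produced here is its reverse complement, which is the pairing condition required for the sticky-end ligation in step 5.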

The various methods and techniques described above provide a number of ways to carry out the invention. Of course, it is to be understood that not necessarily all objectives or advantages described may be achieved in accordance with any particular embodiment described herein. Thus, for example, those skilled in the art will recognize that the methods can be performed in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other objectives or advantages as may be taught or suggested herein. A variety of advantageous and disadvantageous alternatives are mentioned herein. It is to be understood that some preferred embodiments specifically include one, another, or several advantageous features, while others specifically exclude one, another, or several disadvantageous features, while still others specifically mitigate a present disadvantageous feature by inclusion of one, another, or several advantageous features.

Furthermore, the skilled artisan will recognize the applicability of various features from different embodiments. Similarly, the various elements, features and steps discussed above, as well as other known equivalents for each such element, feature or step, can be mixed and matched by one of ordinary skill in this art to perform methods in accordance with principles described herein. Among the various elements, features, and steps some will be specifically included and others specifically excluded in diverse embodiments.

Although the invention has been disclosed in the context of certain embodiments and examples, it will be understood by those skilled in the art that the embodiments of the invention extend beyond the specifically disclosed embodiments to other alternative embodiments and/or uses and modifications and equivalents thereof.

Many variations and alternative elements have been disclosed in embodiments of the present invention. Still further variations and alternate elements will be apparent to one of skill in the art. Among these variations, without limitation, are the compositions for, and methods of, genetic editing, in vivo methods associated with genetic editing, compositions of cells generated by the aforementioned techniques, treatment of diseases and/or conditions that relate to the teachings of the invention, techniques and composition and use of solutions used therein, and the particular use of the products created through the teachings of the invention. Various embodiments of the invention can specifically include or exclude any of these variations or elements.

In some embodiments, the numbers expressing quantities of ingredients, properties such as concentration, reaction conditions, and so forth, used to describe and claim certain embodiments of the invention are to be understood as being modified in some instances by the term “about.” Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the invention may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements.

In some embodiments, the terms “a” and “an” and “the” and similar references used in the context of describing a particular embodiment of the invention (especially in the context of certain of the following claims) can be construed to cover both the singular and the plural. The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.

Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.

Preferred embodiments of this invention are described herein, including the best mode known to the inventor for carrying out the invention. Variations on those preferred embodiments will become apparent to those of ordinary skill in the art upon reading the foregoing description. It is contemplated that skilled artisans can employ such variations as appropriate, and the invention can be practiced otherwise than specifically described herein. Accordingly, many embodiments of this invention include all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

Furthermore, numerous references have been made to patents and printed publications throughout this specification. Each of the above-cited references and printed publications is herein individually incorporated by reference in its entirety.

In closing, it is to be understood that the embodiments of the invention disclosed herein are illustrative of the principles of the present invention. Other modifications that can be employed can be within the scope of the invention. Thus, by way of example, but not of limitation, alternative configurations of the present invention can be utilized in accordance with the teachings herein. Accordingly, embodiments of the present invention are not limited to that precisely as shown and described.

1. A composition comprising a vector encoding a fusion protein comprising at least one endonuclease and a DNA binding moiety.
2. The composition of claim 1, wherein the fusion protein comprises at least one endonuclease selected from the group consisting of: a clustered regularly interspaced short palindromic repeats (CRISPR) protein, a zinc finger nuclease (ZFN) and a transcription activator-like effector nuclease (TALEN).
3. The composition of claim 2, wherein the CRISPR protein comprises cas9.
4. The composition of claim 1, wherein the DNA binding moiety comprises a zinc finger protein.
5. The composition of claim 4, wherein the zinc finger comprises a left-handed CCR5 binding protein.
6. The composition of claim 1, wherein the at least one endonuclease and DNA binding moiety are joined by a linker comprising two, three, four, five, six, seven, eight, nine, ten or more amino acids.
7. The composition of claim 1, wherein the fusion protein comprises a fluorescent labeled protein.
8. The composition of claim 7, wherein the fluorescent labeled protein comprises one or more proteins selected from the group consisting of: green fluorescent protein (GFP), enhanced GFP (eGFP), red fluorescent protein (RFP) and mCherry.
9. The composition of claim 1, wherein the fusion protein comprises a nuclear localization signal (NLS).
10. The composition of claim 9, wherein the NLS is SV40 NLS.
11-18. (canceled)
19. A kit for genomic editing comprising: one or more vectors encoding a fusion protein comprising at least one endonuclease and a DNA binding moiety; and template DNA comprising at least one expression cassette, two flanking sequences, and a DNA binding moiety sequence.
20. The kit of claim 19, further comprising one or more guide RNAs (gRNAs).
21. The kit of claim 19, wherein the fusion protein comprises at least one endonuclease that is a CRISPR protein and a DNA binding moiety that is a zinc finger protein.
22. The kit of claim 21, further comprising a fluorescent labeled protein.
23. The kit of claim 21, further comprising a nuclear localization signal (NLS).